多任务学习经常用于对一组相同功能集的一组相关响应变量进行建模,从而相对于分别处理每个响应变量的方法提高了预测性能和建模精度。尽管多任务学习的潜力比单任务替代方案具有更强大的推理,但该领域的先前工作在很大程度上忽略了不确定性量化。我们在本文中的重点是神经影像学中常见的多任务问题,其目标是了解多个认知任务分数(或其他主题级评估)与从成像收集的脑连接数据之间的关系。我们提出了一个选择性推断以解决此问题的框架,并具有以下灵活性:(i)通过稀疏性惩罚共同确定每个任务的相关协变量,(ii)基于估计的稀疏性在模型中进行有效推理结构体。我们的框架为推理提供了新的有条件过程,基于选择事件的改进,该事件产生了可拖延的选择调整后的可能性。这给出了最大似然推理的估计方程式的近似系统,可通过单个凸优化问题解决,并使我们能够在大约正确的覆盖范围内有效地形成置信区间。我们的选择性推理方法应用于青少年认知大脑发展(ABCD)研究的模拟数据和数据,比常用的替代方案(例如数据拆分)产生了更紧密的置信区间。我们还通过模拟证明,与单任务方法相比,具有选择性推理的多任务学习可以更准确地恢复真实信号。
translated by 谷歌翻译
The stochastic block model (SBM) is a fundamental model for studying graph clustering or community detection in networks. It has received great attention in the last decade and the balanced case, i.e., assuming all clusters have large size, has been well studied. However, our understanding of SBM with unbalanced communities (arguably, more relevant in practice) is still very limited. In this paper, we provide a simple SVD-based algorithm for recovering the communities in the SBM with communities of varying sizes. We improve upon a result of Ailon, Chen and Xu [ICML 2013] by removing the assumption that there is a large interval such that the sizes of clusters do not fall in. Under the planted clique conjecture, the size of the clusters that can be recovered by our algorithm is nearly optimal (up to polylogarithmic factors) when the probability parameters are constant. As a byproduct, we obtain a polynomial-time algorithm with sublinear query complexity for a clustering problem with a faulty oracle, which finds all clusters of size larger than $\tilde{\Omega}({\sqrt{n}})$ even if $\Omega(n)$ small clusters co-exist in the graph. In contrast, all the previous efficient algorithms that makes sublinear number of queries cannot recover any large cluster, if there are more than $\tilde{\Omega}(n^{2/5})$ small clusters.
translated by 谷歌翻译
监督主体组件分析(SPCA)的方法旨在将标签信息纳入主成分分析(PCA),以便提取的功能对于预测感兴趣的任务更有用。SPCA的先前工作主要集中在优化预测误差上,并忽略了提取功能解释的最大化方差的价值。我们为SPCA提出了一种新的方法,该方法共同解决了这两个目标,并从经验上证明我们的方法主导了现有方法,即在预测误差和变异方面都超越了它们的表现。我们的方法可容纳任意监督的学习损失,并通过统计重新制定提供了广义线性模型的新型低级扩展。
translated by 谷歌翻译
Context is vital for commonsense moral reasoning. "Lying to a friend" is wrong if it is meant to deceive them, but may be morally okay if it is intended to protect them. Such nuanced but salient contextual information can potentially flip the moral judgment of an action. Thus, we present ClarifyDelphi, an interactive system that elicits missing contexts of a moral situation by generating clarification questions such as "Why did you lie to your friend?". Our approach is inspired by the observation that questions whose potential answers lead to diverging moral judgments are the most informative. We learn to generate questions using Reinforcement Learning, by maximizing the divergence between moral judgements of hypothetical answers to a question. Human evaluation shows that our system generates more relevant, informative and defeasible questions compared to other question generation baselines. ClarifyDelphi assists informed moral reasoning processes by seeking additional morally consequential context to disambiguate social and moral situations.
translated by 谷歌翻译
Pre-trained language models, despite their rapid advancements powered by scale, still fall short of robust commonsense capabilities. And yet, scale appears to be the winning recipe; after all, the largest models seem to have acquired the largest amount of commonsense capabilities. Or is it? In this paper, we investigate the possibility of a seemingly impossible match: can smaller language models with dismal commonsense capabilities (i.e., GPT-2), ever win over models that are orders of magnitude larger and better (i.e., GPT-3), if the smaller models are powered with novel commonsense distillation algorithms? The key intellectual question we ask here is whether it is possible, if at all, to design a learning algorithm that does not benefit from scale, yet leads to a competitive level of commonsense acquisition. In this work, we study the generative models of commonsense knowledge, focusing on the task of generating generics, statements of commonsense facts about everyday concepts, e.g., birds can fly. We introduce a novel commonsense distillation framework, I2D2, that loosely follows the Symbolic Knowledge Distillation of West et al. but breaks the dependence on the extreme-scale models as the teacher model by two innovations: (1) the novel adaptation of NeuroLogic Decoding to enhance the generation quality of the weak, off-the-shelf language models, and (2) self-imitation learning to iteratively learn from the model's own enhanced commonsense acquisition capabilities. Empirical results suggest that scale is not the only way, as novel algorithms can be a promising alternative. Moreover, our study leads to a new corpus of generics, Gen-A-Tomic, that is of the largest and highest quality available to date.
translated by 谷歌翻译
Metric learning aims to learn distances from the data, which enhances the performance of similarity-based algorithms. An author style detection task is a metric learning problem, where learning style features with small intra-class variations and larger inter-class differences is of great importance to achieve better performance. Recently, metric learning based on softmax loss has been used successfully for style detection. While softmax loss can produce separable representations, its discriminative power is relatively poor. In this work, we propose NBC-Softmax, a contrastive loss based clustering technique for softmax loss, which is more intuitive and able to achieve superior performance. Our technique meets the criterion for larger number of samples, thus achieving block contrastiveness, which is proven to outperform pair-wise losses. It uses mini-batch sampling effectively and is scalable. Experiments on 4 darkweb social forums, with NBCSAuthor that uses the proposed NBC-Softmax for author and sybil detection, shows that our negative block contrastive approach constantly outperforms state-of-the-art methods using the same network architecture. Our code is publicly available at : https://github.com/gayanku/NBC-Softmax
translated by 谷歌翻译
Fusing camera with LiDAR is a promising technique to improve the accuracy of 3D detection due to the complementary physical properties. While most existing methods focus on fusing camera features directly with raw LiDAR point clouds or shallow 3D features, it is observed that direct deep 3D feature fusion achieves inferior accuracy due to feature misalignment. The misalignment that originates from the feature aggregation across large receptive fields becomes increasingly severe for deep network stages. In this paper, we propose PathFusion to enable path-consistent LiDAR-camera deep feature fusion. PathFusion introduces a path consistency loss between shallow and deep features, which encourages the 2D backbone and its fusion path to transform 2D features in a way that is semantically aligned with the transform of the 3D backbone. We apply PathFusion to the prior-art fusion baseline, Focals Conv, and observe more than 1.2\% mAP improvements on the nuScenes test split consistently with and without testing-time augmentations. Moreover, PathFusion also improves KITTI AP3D (R11) by more than 0.6% on moderate level.
translated by 谷歌翻译
With advances in deep learning model training strategies, the training of Point cloud classification methods is significantly improving. For example, PointNeXt, which adopts prominent training techniques and InvResNet layers into PointNet++, achieves over 7% improvement on the real-world ScanObjectNN dataset. However, most of these models use point coordinates features of neighborhood points mapped to higher dimensional space while ignoring the neighborhood point features computed before feeding to the network layers. In this paper, we revisit the PointNeXt model to study the usage and benefit of such neighborhood point features. We train and evaluate PointNeXt on ModelNet40 (synthetic), ScanObjectNN (real-world), and a recent large-scale, real-world grocery dataset, i.e., 3DGrocery100. In addition, we provide an additional inference strategy of weight averaging the top two checkpoints of PointNeXt to improve classification accuracy. Together with the abovementioned ideas, we gain 0.5%, 1%, 4.8%, 3.4%, and 1.6% overall accuracy on the PointNeXt model with real-world datasets, ScanObjectNN (hardest variant), 3DGrocery100's Apple10, Fruits, Vegetables, and Packages subsets, respectively. We also achieve a comparable 0.2% accuracy gain on ModelNet40.
translated by 谷歌翻译
Emerging real-time multi-model ML (RTMM) workloads such as AR/VR and drone control often involve dynamic behaviors in various levels; task, model, and layers (or, ML operators) within a model. Such dynamic behaviors are new challenges to the system software in an ML system because the overall system load is unpredictable unlike traditional ML workloads. Also, the real-time processing requires to meet deadlines, and multi-model workloads involve highly heterogeneous models. As RTMM workloads often run on resource-constrained devices (e.g., VR headset), developing an effective scheduler is an important research problem. Therefore, we propose a new scheduler, SDRM3, that effectively handles various dynamicity in RTMM style workloads targeting multi-accelerator systems. To make scheduling decisions, SDRM3 quantifies the unique requirements for RTMM workloads and utilizes the quantified scores to drive scheduling decisions, considering the current system load and other inference jobs on different models and input frames. SDRM3 has tunable parameters that provide fast adaptivity to dynamic workload changes based on a gradient descent-like online optimization, which typically converges within five steps for new workloads. In addition, we also propose a method to exploit model level dynamicity based on Supernet for exploiting the trade-off between the scheduling effectiveness and model performance (e.g., accuracy), which dynamically selects a proper sub-network in a Supernet based on the system loads. In our evaluation on five realistic RTMM workload scenarios, SDRM3 reduces the overall UXCost, which is a energy-delay-product (EDP)-equivalent metric for real-time applications defined in the paper, by 37.7% and 53.2% on geometric mean (up to 97.6% and 97.1%) compared to state-of-the-art baselines, which shows the efficacy of our scheduling methodology.
translated by 谷歌翻译
Diffusion models have emerged as a powerful tool for point cloud generation. A key component that drives the impressive performance for generating high-quality samples from noise is iteratively denoise for thousands of steps. While beneficial, the complexity of learning steps has limited its applications to many 3D real-world. To address this limitation, we propose Point Straight Flow (PSF), a model that exhibits impressive performance using one step. Our idea is based on the reformulation of the standard diffusion model, which optimizes the curvy learning trajectory into a straight path. Further, we develop a distillation strategy to shorten the straight path into one step without a performance loss, enabling applications to 3D real-world with latency constraints. We perform evaluations on multiple 3D tasks and find that our PSF performs comparably to the standard diffusion model, outperforming other efficient 3D point cloud generation methods. On real-world applications such as point cloud completion and training-free text-guided generation in a low-latency setup, PSF performs favorably.
translated by 谷歌翻译